A finite difference method for estimating second order parameter sensitivities 
of discrete stochastic chemical reaction networks 

Elizabeth Skubak Wolf 1 ' 51 ) and David F. Anderson 1 ' 12 ) 

Department of Mathematics, University of Wisconsin, Madison, Wisconsin 53706, 
USA 

We present an efficient finite difference method for the approximation of second derivatives, with respect to 
system parameters, of expectations for a class of discrete stochastic chemical reaction networks. The method 
uses a coupling of the perturbed processes that yields a much lower variance than existing methods, thereby 
drastically lowering the computational complexity required to solve a given problem. Further, the method 
is simple to implement and will also prove useful in any setting in which continuous time Markov chains are 
used to model dynamics, such as population processes. We expect the new method to be useful in the context 
of optimization algorithms that require knowledge of the Hessian. 



I. INTRODUCTION 

Stochastic models are commonly used to simulate and 
analyze chemical and biochemical systems, in particu- 
lar when the abundances of the constituent molecules 
are small and ordinary differential equations cease to 
provide a good description of system behavior. The 
most common modeling choice is to use a continuous 
time Markov chain (CTMC), which we represent via the 
stochastic equation (f), but is often described in the bi- 
ology literature via the chemical master equation, and 
simulated using Gillespie's algorithm 1 ' or the next reac- 
tion method. 3 ' 4 

Parameter sensitivity analysis is a valuable tool in this 
setting as it provides a quantitative method for under- 
standing how perturbations in the parameters affect dif- 
ferent response functions of interest. Further, often the 
only means of determining the parameters for these mod- 
els is experimentally. If the model provides a reasonable 
approximation of the system, the sensitivities can be used 
to analyze the identifiability of given parameters. ' They 
can also suggest an experimental design in which more 
resources, which are often limited, can be spent deter- 
mining the more sensitive parameters. 

While first derivative sensitivities have been much 
studied, less focus has been given to finding reasonable 
algorithms for the computation of sensitivities of higher 
order, particularly in the discrete state setting. Second 
derivative sensitivities (the Hessian), however, are also 
useful. For example, they provide concavity information 
which is necessary for finding roots or extrema of an ex- 
pectation. Additionally, in a more general optimization 
setting, the Hessian can be used to improve upon a sim- 
ple steepest-descent method. Newton and quasi-Newton 
methods, for instance, use an approximate Hessian to 
choose the direction in which to step in the next iterate 
of the optimization, using curvature to find a more direct 
path to a local minimum than can be achieved by using 



the gradient alone. When the Hessian is positive semi- 
definite, these methods achieve a fast rate of local con- 
vergence. Additionally, trust-region based optimization 
methods can also be markedly improved by including a 
Hessian estimate. h ~ 9 Developing algorithms that success- 
fully integrate these optimization methods in the chemi- 
cal reaction network and CTMC setting, for example in 
the context of parameter estimation, is a topic of current 
research which depends critically on having an efficient 
method for approximating second derivatives. 

We introduce here a method for the computation of the 
second order sensitivities of stochastically modeled bio- 
chemical reaction networks that is a nontrivial extension 
of the coupled finite difference method developed in 10. 
The proposed method produces an estimate with a sig- 
nificantly lower variance than existing methods, so that 
it requires much less CPU time to produce an approx- 
imation within a desired tolerance level. Additionally, 
the paths generated can also be re-used to compute first 
derivatives of the system for use in any optimization al- 
gorithm. While biochemical reaction networks will be 
the setting for this paper, the proposed method is also 
applicable to a wide variety of continuous time Markov 
chain models, such as those used in queueing theory and 
the study of population processes. 

The outline of the paper is as follows. In Section II, we 
precisely describe the model and problem under consid- 
eration. In Section III, we present the new method and 
give a simple algorithm for implementation. In Section 
IV, we provide several numerical examples to compare 
the new method with existing methods, including finite 
differencing with common random numbers, finite dif- 
ferencing with the common reaction path method, and 
second order likelihood transformations. Finally, in Sec- 
tion V, we provide some conclusions and discuss avenues 
for future work. 



II. THE FORMAL MODEL 



a ' Electronic mail: skubak@math.wisc.edu 
b ) Electronic mail: anderson@math.wisc.edu 



Suppose we have a system of d chemical species under- 
going M reactions, each with a given propensity function 
Xk '■ K d — > R>o (known as an intensity function in the 
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mathematical literature) and reaction vector (transition 
direction) Cfe e lR d . We can model this system as a con- 
tinuous time Markov chain using the random time change 
representation 1 1-13 



Xt — Xq 
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where the Yk are independent, unit-rate Poisson pro- 
cesses and Xo is the initial state. We assume, without 
loss of generality, that the state space S is some subset of 
Z> . That is, the abundances of the constituent species 
are always non-negative integers. Note that the chemi- 
cal master equation (forward equation in the language of 
probability) for the above model is 

d M 

-Tj.Px {x, *) = E Px °( X ~ Ck,t)\k(x - Cfc)l {x _ ffceZ | o } 

fc=l 
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where Px (x,t) is the probability of being in state x £ 
Z> at time t > given an initial condition of Xq. 

Intuitively, the random time change representation 
(1) can be understood as follows. Let Rk(t) := 

Yk (^Jq Xk(X s )ds^j . Then Rk{t) counts the number of 
times the fcth reaction has occurred up to time t, and 
Rk(t)(k is the change in the system due to these reac- 
tions. The representation (1) then shows that the process 
at time t is simply its initial value plus the total change 
up to time t due to each of the M reactions. To under- 
stand the counting processes Rk{t) — Yk ( J * Xh(X s )ds\ , 

picture a realization of the unit-rate Poisson process Yk 
as being determined by points on K>o giving the jump 
times of Yfc. For example, we could have 



X 


X 


X 


X 


X 




s 



where the "X" marks correspond to the jump times of 
the Poisson process. At time zero begin at the ori- 
gin, and as time increases, travel to the right. At 
a given time s on the line, the value lfe(s) is how 
many points we have passed up to and including s. 
For example, in the picture above, Yfc(s) = 4. The 
propensity function Xk then indicates how fast we 
travel on this line. If Xk(X s ) is very large and t\ > 
to, we expect J* Xk{X s )ds to be much larger than 

Jq° Xk(X s )ds, so that Yk ^ Jq 1 Xk(X s )ds\ is much larger 

than Yk (^J ta Xk(X s )ds^; we have traveled past many 

points along the line between times t and t\, and so were 
"moving quickly." At the other extreme, if Xk(X s ) is zero 
between to and t\, then J^ 1 Xk(X s )ds = jj Xk(X s )ds, 
so that Y k ( J* 1 A fe (A s )ds) = Y k ( / Q to X k {X s )ds) ; in this 



case we did not travel anywhere on the line, but were 
"stopped." For further information, intuition, and a 
derivation of this representation, see 12, 13, and 15. 

Now we suppose that the propensities are dependent 
on some vector of parameters 6; for instance, 9 may rep- 
resent a subset of the system's mass action kinetics con- 
stants. We then consider a family of models X t (9), pa- 
rameterized by 8, with stochastic equations 



X t (6) = X (6) + 



M 

En- 

k=l 



X k (6,X s (6))ds)( k , (2) 



Letting / be some function of interest, for example the 
abundance of some molecule, we define 

J (9) :=Ef(6,X t (0)). 

This paper is concerned with finding an efficient com- 
putational method for the approximation of the second 
partial derivatives of J, 



d 2 



J(9). 



There are several existing methods for the approxi- 
mation of such sensitivities. The likelihood ratio (LR) 
method proceeds analytically by moving the derivative 
inside the expectation. 1 '' 16 The variance of such estima- 
tors, however, are often prohibitive. In Section IV, we 
include numerical results from the LR method for com- 
parison. The general method of infinitesimal perturba- 
tion (IP) also proceeds by moving the derivative inside 
the expectation. However, IP methods do not apply 
for many stochastically modeled chemical reaction net- 
works, as the requirements of the needed analytical tools 
are typically not met. See the appendix of 17. 

Finite difference methods for approximating these 
sensitivities start with the simple observation that for 
smooth functions J, we may approximate second partial 
derivatives by perturbing the parameter vector in both 
relevant directions, so that 
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J{6) 



(3) 



_ J(6 + (gj + ej)e) - J(6 + ge) - J(9 + eje) + J(0) 
+ 0(e), 

where ej is the vector with a 1 in the i th position and 
elsewhere. Thus, for second derivatives, finite difference 
methods require up to four simulated paths to produce 
one estimate, as opposed to the LR method, which re- 
quires only one path per estimate. When coupling meth- 
ods are used with the finite difference, however, the vari- 
ance of the estimates produced are usually significantly 
lower than LR, as demonstrated in Section IV, so that fi- 
nite different methods often provide much more effective 
estimators. 

In our setting, equation (3) suggests an approximation 
of the form 
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ddjde, 



j(e) «e 



f(8,X t (8 + (a + e 3 )e)) - f(8, X t (6 + e 4 e)) - f(8, X t (8 + e 3 e)) + f(8, X t (8)) 



(4) 



The Monte Carlo estimator for (4) with R estimates is 
then 



R 

i=i 



where 
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f(8,X m (8))], 
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where, for example, X t ^(8) is the £th path simulated 
with parameter choice 8. Note that if the four relevant 
processes are computed independently, which we will 
call the Independent Random Numbers (IRN) method, 
the variance of the estimator Dn(e) is i? _1 Var((i(e)) = 
0(R~ 1 e~ 4 ). The goal of any coupling method in this 
context is to lower the variance of die) by correlating the 
relevant processes. 

We will demonstrate via example that the method pre- 
sented here lowers the variance of the numerator of d(e) 
to O(e), thereby lowering the variance of d{e) to 0(e -3 ), 
yielding Var(D fl (e)) = O^" 1 ^ 3 ). The proof of this fact 
follows from work in 10. For several non-trivial examples, 
however, the method gives even better performance, low- 
ering the variance of d{e) another order of magnitude to 
0(e~ 2 ). In contrast, every other coupling method we 
attempted v yielded an asymptotic variance for d(e) of 
0(e~ 3 ) at best, and in general were much less efficient 
than the method being proposed here. Theoretical work, 
and a discussion of conditions on the model guarantee- 
ing the faster rate of convergence, will be presented in a 
follow-up paper. 

For ease of exposition and notation, we have described 
finite differences using the forward difference (4). Our 
formal construction will also use the forward difference. 
In practice, however, it is no more difficult to use the 
central second difference, 

e- 2 [f(8, X t {8 + (a + e 3 )e/2)) - f(8, X t (9 + (e 4 - e 3 )e/2)) 
- f(8, X t (8 + (e 3 - e t )e/2)) + f(8, X t (8 - (e* + e 3 )e/2))} , 

(6) 

which has a bias of only 0(e 2 ); this is what we have 
implemented in our numerical examples. 



III. COUPLING THE FINITE DIFFERENCE 

The goal of any coupling of the finite difference is to 
reduce the variance of the estimator produced by some- 
how ensuring that the four paths needed in (3) remain 



close together. The common random numbers (CRN) 
coupling achieves this goal by reusing the uniform ran- 
dom numbers in an implementation of Gillespie's direct 
algorithm. Implicit in equation (2) is the Common Re- 
action Path (CRP) coupling, 15 which assigns a stream of 
random numbers to each Yj. which are then used to pro- 
duce the required realizations of the stochastic processes. 

The method presented here, on the other hand, forces 
the paths to share reactions; often two or more of the four 
paths have the same reaction occur at the same point in 
time. Further, the method often naturally "recouples" 
the processes during the course of a simulation. These 
facts allow the paths to remain closer than is possible 
by only sharing random numbers, and so the method 
consistently produces an estimate with lower variance, 
often significantly so. We provide numerical evidence for 
this comparison of methods in Section IV; we also briefly 
revisit this discussion at the end of Section MB. 



A. Review of first derivatives 

The main idea of the method presented here is most 
easily seen in the context of first derivatives, where only 
correlated pairs of runs are required to approximate the 
first finite difference 



~\J(8- 



J(8)). 



The main idea of the coupling presented in this paper is 
illustrated in the following toy example. Suppose we wish 
to study the difference between two Poisson processes 
Z\,Zi with rates 13.1 and 13, respectively. One way 
would be to use independent, unit rate Poisson processes 
Y\ , Y~2 and write 



Z x {t) = Yi(13.1t) and Z 2 (t) = Y 2 (13t). 



(7) 



Then E(Z x (t) - Z 2 (t)) = O.lt and Var(Zi(t) - Z 2 {t)) 
Var(Fi (13.lt)) +Var(F 2 (13t)) = 26.K. 

We would like to lower this variance: instead, write 

Zi(t) = Yi(13t) + Y 2 (0.1t) and Z 2 (t) = F x (13t). 



EY 2 (0.lt) = O.lt 
= Var(r 2 (0.1t)) = 



Then we still have E(Z x (t) - Z 2 {t)) = 
as needed, but now Var(Zi(t) — Z 2 {t)) 
O.lt instead. 

In general, we want to consider the difference of two 
processes Z\ , Z 2 with intensities A and B (the intensities 
may be functions of time, but this is not important to the 
main idea). In this case, we would write 

Z 1 =Y 1 + Y 2 and Z 2 = Y 1 + Y 3 , 
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where Y±,Y2, and Y3 are independent unit-rate Poisson 
processes that have intensities 

m := min{A, B}, A — m, and B — m, 

respectively. In other words, we have split the counting 
processes Z\ and Z 2 into three sub-processes. One of 
these, Yi, is shared between Z\ and Z%, so that at the 
time at which Y\ jumps, both of the original processes 
jump. These shared jumps lower the variance of the dif- 
ference Z\ — Zi to only A + B — 2m = \A — B\, rather 
than the A + B which would result from using only two 
processes similarly to (7) above. 

This is precisely the idea needed for the computation 
of first derivatives, as in ] I, which is discussed in Section 
III C in more detail. This idea will also serve as the basis 
for our coupling method for second derivatives. 

B. Construction of the Coupling 

For the computation of second derivites, rather than 
correlated pairs of runs, correlated quartets of runs are 
used. We suppose we have the four CTMCs of (4), with 
hj G {!)•••) Af} fixed, which for convenience of exposi- 
tion we order as 

X t {0 + + X t (6 + e t e), X t (6 + ej e), X t {6). (8) 

We also assume that their initial conditions are equal 
(i.e., they are equal at t = 0), to some value X (9). For 
each of the four processes above, there is an associated 
propensity for each of the M reaction channels. For ex- 
ample, the propensity of the fcth reaction channel of the 
first process (the one with parameter choice 9+(ei + ej)e) 
is 

A fe ,i := A fe (6> + (ej + e 3 )e, X t (0 + (e ( + e^e)). 

Similarly rename the propensity of the fcth reaction chan- 
nel of the second process (parameter choice 9 + ae) by 
Afc, 2, the third process as Afc, 3, and the fourth process as 
Afc, 4, as P er our ordering (8). Note that these propensities 
are dependent on 9 and X t (6), but we will drop either 
or both of these dependencies in our notation when they 
are not relevant to the current discussion. 

Next, we introduce a coupling of these four processes 
that will produce an estimator (5) with low variance. The 



main idea is similar to that in Section III A as well as in 
in that it rests on splitting a counting process into 
sub-processes, to be shared among the four CTMCs (8). 
With this goal in mind, we create a sub-process to allow 
the 1st and 2nd processes of (8) to jump simultaneously, 
one to allow the 1st and 3rd to jump simultaneously, one 
for the 2nd and 4th, and one for the 3rd and 4th. Ad- 
ditionally, we create a sub-process that allows all four to 
jump simultaneously. As in the first derivative setting, 
the rates of these sub-processes will involve minimums of 
the original CTMCs. Finally, we also require four addi- 
tional sub-processes to make up any "leftover" propensity 
of the original CTMCs. 

Formally, define i?fe l [b 1 .6 2! 6 3l b 4 ] as a counting process, 
where bg g {0, 1}. A jump of -R/ C ,[6 1 .b 2 .6 3 ,6 4 ] indicates that 
the ^th process in the ordering (8) jumps by reaction 
fc if and only if be — 1, for I £ {1,2,3,4}. For example, 
■Rfc. [1,1,0,0] (t) counts the number of times the fcth reaction 
has fired simultaneously for the first and second processes 
of (8) (but the third and fourth did not fire), whereas 
-Rfc,[i,o,i,o] (t) counts the number of times the fcth reaction 
has fired simultaneously for the first and third processes 
of (8) (but the second and fourth did not fire). Define 

the propensity of #&,[6i, 62,63,64] by ^,[61,62,63,64]) so tnat 
in the random time change representation (2), 

^,[61,62,63,64] (*) — ^,[61,62,63,64] (^J Afe,[6i,b 2 ,6 3 ,64]( s )^ s ^ 

where the Y's are independent unit-rate Poisson pro- 
cesses and where the propensities are 

A*,[i, 1,1,1] = A/^i A Xk,2 A Xk.3 A XkA 

Afc,[i,i,o,o] = Afe.i A Xk,2 - A-k,[i,iA,i] 

Afc, [0,0,1,1] = Afc,3 A Xk t A — ^-k,[i,i,i,i] 

A-fc,[i,o,i,o] = (Afe,i — Afe,i A Afc_2) A (Afe,3 — Xk.3 A Xk.4) 

Afc,[o,i,o,i] — (Afc,2 — Afc,i A Afc i2 ) A (Afc j4 — A A/^4) 

(10) 

Afc, [1,0,0,0] = (Afe,i - Xk,i A A fci2 ) - A fc ,[1,0,1,0] 
Afc, [0,1,0,0] = (A/c, 2 - Afe,i A A fc , 2 ) — Afc, [0,1,0,1] 
Afc, [0,0, 1,0] = (Afc, 3 ~ Afc, 3 A Afc, 4) — Afc, [1,0, 1,0] 
Afc, [0,0,0,1] = (Afc,4 — Afc, 3 A Afc, 4 ) — A fc , [0,1,0,1], 

where we define the notation a A b := min{a, b}. The 
proposed coupling is then given by the following: 
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X t (9 + (e, + ej)e) — X (9) + ^ Cfc(-Rfc,[i,i,i,i](*) + ^*,[i,i.o,o](*) + #fe,[i,o,i,o](*) + ^fe,[i,o,o,o](*)) 

fe 

X t (0 + EiE) = X Q (9) + Cfc(-Rfc,[i,i,i,i](*) + ^4,[i, 1,0,0] (*) + ^4,[o,i,o,i] CO + -Rfe,[o,i,o,o](*)) 
fe 

,[0,0,1,1] (t) + Rk, [1,0,1,0] (t) + ^fe,[o,o,i,o] (t)) 

k 

X t (9) = X Q (9) + Cfe(^fe,[i,i,i,i] (*) + ^,[0,0,1,1] W + Rk ,[0,1,0,11(0 + Rk, [0,0,0, !](*))■ 



A few comments are in order. First, note that, for ex- 
ample, the marginal process X t (9 + (e^ + ej)e) above in- 
volves all the counting processes in which 6 X = 1. Second, 
each of these marginal processes X t {-) have the same dis- 
tribution as the original, uncoupled, processes since the 
transition rates of the marginal processes have remained 
unchanged. This can be checked by simply summing the 
rates of the relevant counting processes, which are all 
those Afejfjj ^^{k, fiA in which a given bi = 1. Third, if / 
is linear, for example if we are estimating the abundance 
of a particular molecule, many of the Rk,[b u b 2 ,b 3 ,bi] are 
completely cancelled if we now construct the difference 
(4). An example of this will be shown in Section IV A. 
Fourth, even if i = j, the coupling requires two different 
copies of the process X t (9 + e^e), one taking the role of 
X t (9 + e^e, t) and the other X t (9 + eje). 

As discussed at the beginning of this section, the CRN 
and CRP methods attempt to reduce the variance of 
the estimator (5) by reusing random numbers for each 
of the four nominal processes. However, as discussed in 
10 in the setting of first derivatives, this will often lead 
to a decoupling over long enough time periods. Hence, 
the variance of the CRN and CRP estimators will often 
eventually converge to a variance of the same order of 
magnitude as the estimator constructed using indepen- 
dent samples. This behavior is demonstrated by exam- 
ple in the current setting of second derivatives in Section 
IV. The double coupled method presented here re-couples 
the four relevant processes every time they are near each 
other, which, by contrast, does not occur in cither CRN 
or CRP. We refer the interested reader to 10, Section 3.1 
for a more thorough discussion of this idea. 



I 

in (8) are constructed as: 

X(6 + (e* + ej )e, t) = X (6) + ^ + Hfc,[i,o]) Cfc 

k 

X{6 + e <e , t) = X Q (9) + ^ {Rk,\i,i) + R k,[o,i]) Ch, 

k 

(12) 

where R k ,[b u b 2 ] = Y k,[b u b 2 ] (jo ^k,[b u b 2 ] (s)da\ are de- 
fined analogously to (9) and where 

Afc,[i,i](s) = A fc ,i(s) A Afc, 2 (s), 
Afc,[i, ](s) = A fe ,i(s) - A M (s) A A fe , 2 (s), 
A fc,[o,i]( s ) = A fc , 2 (s) - Afe,i(s) A Afc, 2 (s)- 

As in (11), the processes defined in (12) jump together 
as often as possible: they share the sub-processes Rk,n,i], 
each of which runs at a propensity equal to the mini- 
mum of the respective propensities of the two original 
processes. We then expect the variance of the first finite 
difference [/((?, X t {9 + (e, + e,)e)) - f(0, X t (9 + e l e))] e - 1 
to be small since the two processes of (12) will remain 
approximately the same whenever they jump simultane- 
ously via iZfc,[i,i]- 

Now note that, together, the two processes (12) can be 
viewed as a new CTMC with dimension 2d, twice that 
of that of the original process. The third and fourth 
processes in (8) can be similarly coupled, giving us two 
2d-dimensional CTMCs. Finally, we couple these new 
processes into a single CTMC of dimension Ad, in pre- 
cisely the same manner of 10. This construction leads to 
the same process as given in (11). The details are left to 
the interested reader. 



C. An Alternative Derivation 

The coupling described in the previous section can 
be derived in an alternate way, which explains why the 
method is termed "double coupled." We could first cou- 
ple the first and second processes of (8) using the coupled 
finite difference method, and then couple the third and 
fourth in the same manner. For example, using the 
as defined in the previous section, the first two processes 



D. Algorithms for simulation of (11) 

We present two algorithms for the pathwise simulation 
of the equations (11). The first corresponds to the next 
reaction method of 4, whereas the second corresponds 
to an implementation of Gillespie's direct method. 1 ' 2 As 
usual, it will be problem specific as to which algorithm 
is most efficient. 

Below, rand(0,l) indicates a uniform[0,l] random vari- 
able, independent from all previous random variables. 
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Recall that if U ~ rand(0,l), then ln(l/£/)/A is expo- 
nentially distributed with parameter A > 0. Also recall 
that even if i and j are equal, the processes X(9 + e^e) 
and X{9 + e 3 t) are still constructed separately. Define 
the set 

B:={[1, 1,1,1], [1,1, 0,0], [0,0, 1,1], [1,0, 1,0], 

[0, 1, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]} 

and note that it will often be convenient to use a for loop, 
from 1 to 9, to enumerate over the vectors in B. 

Algorithm-modified next reaction method ap- 
plied to (11). 

Initialization: Set X(9 + (e, + e 3 )e) = X(9 + e,e) = 
X(9 + eje) = X{9) = X and t = 0; for each k G 
{1, ...,M} and each b £ B, set Tk.b — and Pk.b — 
ln(l/ti fe)b ) for u k , b ~ rand(0,l). 

Repeat the following steps: 

(i) For each k, set 

Ajk.i = A fe (6» + (e» + ej )e, X{9 + (e< + e^e)) 
A fe ,2 - X k (9 + e l e,X(9 + e l e)) 
Afe.s - X k (9 + e 3 e,X(9 + e 3 e)) 
X kA = X k {9,X{9)) 

and use to set each of the nine variables A kl b as 
above in (10). 

(ii) For each k and b £ B, set 

a . / (Pit,* - T M )/A M , if A fc . b > 
I oo , else 

(iii) Set A = min k ^{At k _b} and let /i :— k and v := b = 
[bi, 62 j &3, ^4] be the indices where the minimum is 
achieved. 

(iv) Set t = t + A. 

(v) Update state vector variables X(9+(ei+ej)e), X(6+ 
eic),X(6+eje),X(9) by adding ^ to the Ith process 
if and only if bi = 1 in 2/. 

(vi) For each k and b £ B, set T feib = T fe , b + A • A fe)b . 

(vii) Set P M ,^ = P Mi „ + ln(l/u) where u ~ rand(0,l). 

(viii) Return to (i) or quit. 

Algorithm- Gillespie's Direct Method Applied 
to (11). 

Initialization: Set X(9 + (a + e 3 )e) = X(9 + e^e) = 
X(9 + e 3 () = X{9) = X and t = 0. 

Repeat the following steps: 



(i) For each k, set 

A fe ,i = X k (9 + (a + e 3 )e, X{9 + (e 4 + e 3 )e)) 
X k: 2 = X k {0 + e l e,X(9 + e l e)) 
X k: s = X k (9 + e j e,X(9 + e j e)) 
X kA = X k {9,X{9)) 

and use to set each of the nine variables A k ,b as 
above in (10). 

(ii) Let A = J2 k J2 b A k ,b and u ~ rand(0, 1), and set 

A = ln(l/tt)/Ao. 

(iii) Set t = t + A. 

(iv) Let u ~ rand(0, 1) and use to select (//, v) £ {(k, b) : 
k £ {1, ...,M},b £ B} where each pair (k,b) is 
selected with probability Afc^/Ao- 

(v) Update state vector variables X(9+(ei+ej)e), X(9+ 
e^e), X(9+eje), X(9) by adding to the £th process 
if and only if bg = 1 in v. 

(vi) Return to (1) or quit. 

IV. NUMERICAL EXAMPLES 

In this section, we compare the double coupled method 
with the following existing methods: 

(a) the usual Independent Random Numbers (IRN) es- 
timator in which the processes of (4) are simulated 
independently, also referred to as the crude Monte 
Carlo method, 

(b) the common random numbers approach (CRN) in 
which the processes of (4) are simulated given the 
same stream of random numbers using Gillespie's di- 
rect algorithm, 

(c) the Common Reaction Path (CRP) method of 19 in 
which the processes of (4) are coupled by reusing each 
Y k of (2), 

(d) the double coupled method proposed here (CFD2, 
where the CFD stands for "coupled finite difference" ) 
which implements the coupling (11), 

(e) a Girsanov transformation or likelihood ratio method 
(LR) in which the computed weight function is used 
as a control variate (see 16, and 6 Chapters V.2, 

vn.3). 

All methods except (b) were simulated using the next 
reaction algorithm, modified as necessary. We also note 
that the first four methods use the second finite differ- 
ence, which has some bias (see Section II); recall that to 
reduce this bias we actually simulate the centered differ- 
ence (6), which is accomplished in the same way as the 
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forward difference but with the parameters shifted. The 
LR method is the only one of the four methods we use 
here that is unbiased; its high variance, however, typ- 
ically makes the method unusable. Finally, when dis- 
cussing performance, we will refer to £ of (5) as the 
number of estimates. 



A. A Simple Birth Process 

Consider a pure birth process A —> 2 A. Here, £ = 1, 
and denoting by X t the number of A molecules at time 
t, we assume a propensity function X(9, X t (9)), so that 
in the random time change representation, 

X t {9) = X Q + Y (J \(6,X s (0))ds) , 



where, as usual, Y is a unit-rate Poisson process. 

Suppose we are interested in the second derivative of 
KX t with respect to 9 (so that f(0,x) — x). We dou- 
ble couple the processes as in (11), noting that we are 
in the special case when i — j. This does not change 
the main idea of the double coupling, but it requires us 
to distinguish the two nominal processes with the same 
parameter value 9 + e; we label them as X}{9 + e) and 
Xf(9 + e). Ordering as in (8), and noting that since there 
is only one reaction we may drop the subscript k, we find 
that 

Ax = X(9 + 2e,X t (9 + 2e)) 
\ 2 = \(9 + e,Xl(9 + e)) 
A 3 = \(9 + e,X?(9 + e)) 
A 4 = A(0,X t (0)), 

and use these to define the A's as given in (10). The 
double coupled processes are then given as 



X t {9 + 2e) = X {9) + (t) + £[i,i,o,o] (*) + £[1,0,1,0] (*) + £[i,o,o,o] (*) 

X}(8 + c) = X {9) + £[1,1,1,1] (f) + £[1,1,0,0] (*) + %i,o,i](*) + %i,o,o](*) 
Xf(9 + e) = X {9) + £[1,1,1,1] (t) + £[o,o,i,i] (*) + £[1,0,1,0] (*) + £[o,o,i,o]0) 
X t {9) = X (9) + £[1,1,1,1] (i) + £[0,0,1,1] (t) + £[0,1,0,1] (*) + £[o,o,o,i] (*)• 



Now that we have coupled the processes, note that when 
we consider the second difference (4) for the given /, 
which is linear, most of the sub-processes cancel. For ex- 
ample, since £[1,10,0] is present in both X t (9 + 2e) 7 which 
is positive in the difference, and in X^(9 + e), which is 
negative, £[1,1,0,0] is n °t present in the second difference. 
One can easily check that the numerator of the difference 
(4) simplifies in this case to 



X t (9 + 2e) -X\ 

= £[1,0,0,0] (*) - 



{9 + e)-X 2 t {9 + e)+X t {9) 

£[0,1,0,0] (*) - £[0,0,1,0] (*) + £[0,0,0,1] (*)■ 

(13) 



Note that the rates of the four remaining counting pro- 
cesses of (13) are usually relatively small; in fact, at any 
given time at least two of the four must have zero propen- 
sity, as can be seen by considering the possible values of 
the minima involved. 

Suppose that X(9, X t (9)) — 9X t (9) is simply a constant 
times the population at time t. We choose to estimate 
at t = 5 and 9 = 1/2, with X (9) = 1. We use 



d 2 EX t 
d6 2 



e = 1/50 for the finite difference methods. For simple 
examples such as this, one can solve for the derivative 
explicitly; in this case the actual value is 304.6. 

As can be seen in the data in Table I, manifested in 
the width of the confidence interval, the variance of the 
double coupled estimator is smaller than that of the esti- 



Method 


Estimates 


Approximation 


# updates 


CPU time (s) 


IRN 


100,000 


307 ± 447 


w 3.7 x 10 6 


38 


CRP 


100,000 


315 ± 24 


w 3.7 x 10 6 


49 


CRN 


100,000 


282 ± 24 


w 3.3 x 10 6 


32 


LR 


100,000 


311 ± 20 


w 1.1 x 10 6 


37 


CFD2 


100,000 


296 ± 12 


w 1.2 x 10 6 


22 



TABLE I. 95% confidence intervals and computation time for 
each of the five methods (a) through (e), after 100,000 esti- 
mates, on the simple birth model of IV A (with linear propen- 
sity). An e of 1/50 was used for the three finite difference 
methods. Actual value: 304.6. 



mators given by the other methods. For instance, for the 
same number of estimates it gives a confidence interval 
of half the width of the CRP and CRN methods, which 
for this single-reaction model, though implemented differ- 
ently, give equivalent estimators. Here and throughout, 
confidence intervals are constructed as ±1.96^/v where v 
is the variance of the estimator (5). 

For each method, we also include the CPU time that 
was required for the simulation, as well as the number of 
updates made to the system state (the number of times a 
reaction vector is added to the state vector). The latter 
is a useful comparison tool, as it provides a measure of 
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the amount of work the method requires, but is not in- 
fluenced by differences in implementation (such as use of 
Gillespie vs next reaction algorithms). These differences, 
on the other hand, often affect CPU time. We do not 
also provide a random number count for each method, 
but note here that except for CRN this number is equal 
to the number of system updates. For CRN, which uses 
Gillespie's algorithm, two random numbers are used per 
system update. Finally, the CPU time will certainly vary 
by machine; all tests described in this section were run 
in MATLAB on a Windows machine with a 1.6 GHz pro- 
cessor. 



of gene transcription and translation, mRNA is being 
created, and then translated into protein, while both the 
mRNA and the protein may undergo degradation (where 
here the constants arc in the sense of mass action kinet- 
ics, so for example protein is being created at a rate of 7 
times the number of mRNA molecules) : 



I^m4m + p, p -4 0. 

e 



B. mRNA Transcription and Translation 

We now examine the performance of the proposed 
method on a more realistic model. In the following model 



We assume initial concentrations of zero mRNA and 
protein molecules. The stochastic equation for this model 



X(0, t) = Y 1 (2t) (D+Y2U OX M (0, s )d S ^j ( -i) + Y 3 (J jX M (6, s)ds ) j ( ° ) + Y 4 (J X P (6, s)ds\ ( ^ ) 



where X = ( X M ) gives the numbers of the mRNA and 
protein molecules respectively. Note that we have moved 
the parameter t from the subscript for notational conve- 
nience. 

In subsection IV B 1, we compute the second derivative 
of the expected number of protein molecules with respect 
to 9, while in subsection IV B 2, we compute the mixed 
partial of this same quantity with respect to both 9 and 7. 
In subsection IV B 3, we compute the second derivative of 
the square of the expected number of protein molecules 
with respect to 9. 



1. 2 nd derivative of protein abundance with respect to 8 

Suppose we would like to estimate the second deriva- 
tive of the expected number of protein molecules with 
respect to 9 at a time of t = 30 and 9 — \ . Additionally, 
we fix 7 = 10 and Xq = 0. One can analytically find that 
J^EX F (30) = 2496. 

First, Table II gives simulation data as in the previous 
examples, with two different perturbations, e, of 9 used. 
Note the trade-off between bias and precision: a larger 
epsilon implies the second finite difference has a larger 
bias, but, since there is an e 2 in the denominator of the 
estimator, the variance of the estimator is smaller; for 
small epsilon it is vice- versa. Table III shows the relevant 
data for the LR method. 

Perhaps more illustrative is Table IV, which compares 
the numbers of estimates and system updates as well as 
the time required to achieve a 95% confidence interval 



I 

of a set width. These data give a good idea of the effi- 
ciency of the methods, as often one desires the estimate 
within a given tolerance. We can see that the double 
coupled method is approximately 25 times faster than 
CRP, 73 times faster than the often used CRN method, 
over 100 times faster than the LR method, and over 125 
times faster than IRN. Note also that the double coupled 
method requires drastically fewer estimates to achieve the 
same confidence, so that, even though the computation 
of one double coupled estimate requires more time than 
most of the other methods, as can be seen in Table II, 
the lower variance leads to very large time savings. 

Finally, in Figure 1 we include a plot of the variance of 
the different estimators versus time. Note that the scales 
on the plots are very different. The plots correspond- 
ing to finite difference methods all appear to converge; 
the limiting value for the double coupled method, how- 
ever, is over 20 times smaller than the CRP method, and 
over 170 times smaller than the CRN and IRN methods. 
Note also that, as time increases, the CRN variance tends 
to the same value as the IRN method; this is expected, 
since we expect the processes to decouple. The variance 
for CRP behaves similarly, converging to a number of 
approximately the same order of magnitude as the IRN 
method, though the value itself is significantly lower in 
this four-reaction model. The plot for the LR method 
scales quadratically, as is expected by the form of the 
estimator (see Chapter VII. 3 of 6). This shows that, for 
moderate and large times, the double coupled method 
quickly becomes much more efficient then the other esti- 
mators. 
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Method 


Estimates 


1 /on 

e = 1/20 


E = 1/100 


^ updates 


UrU time (sj 


CRN 


1,000 


2682 ± 1192 


5950 ± 19123 


« 1.26 x 10 7 


46 


CRP 


1,000 


2758 ± 569 


-2630 ± 9268 


« 1.27 x 10 7 


70 


CFD2 


1,000 


2655 ± 129 


2640± 1001 


« 4.68 x 10 6 


48 


CRN 


10,000 


2453 ± 369 


1505 ± 6120 


« 1.27 x 10 s 


457 


CRP 


10,000 


2783 ± 179 


2627 ± 2937 


« 1.27 x 10 s 


672 


CFD2 


10,000 


2601 ± 40 


2352 ± 282 


« 4.68 x 10 7 


483 


CRN 


40,000 


2386 ± 188 


1069 ± 2984 


« 5.07 x 10 s 


1829 


CRP 


40,000 


2745 ± 89 


3593 ± 1468 


« 5.07 x 10 8 


2739 


CFD2 


40,000 


2582 ± 20 


2512 ± 147 


« 1.87 x 10 s 


1931 



TABLE II. 95% confidence intervals for each of the finite difference methods (b), (c), and (d) for the computation in the mRNA 
and protein model of subsection IV B 1. Note that the bias of the second finite difference can be seen when e = 1/20 (the 
actual value is 2496). Also note that, though for a fixed number of estimates the CFD2 method is not the fastest method, it 
achieves a much smaller confidence interval. The number of updates and computational time for a fixed number of estimates 
are essentially independent of e and so the reported values, here and throughout, are the average of the values for the two 
choices of e. 



Estimates 


Approximation 


# updates 


CPU time (s) 


1,000 


2150 ± 2258 


« 4.20 x 10 6 


14 


10,000 


2429 ± 729 


« 4.19 x 10 7 


135 


40,000 


2176 ± 404 


« 1.68 x 10 8 


540 



TABLE III. 95% confidence intervals for the LR method (d) 
for the computation in the mRNA transcription model com- 
putation of subsection IV Bl. Even though this method is 
fastest per estimate, note that the variance (and so the width 
of the confidence interval) is large. 



Method 


Estimates 


Approximation 


# updates 


CPU time (s) 


LR 


495,000 


2506 ± 120 


« 2.1 X 10 9 


6619 


IRN 


190,000 


2617 ± 120 


fa 2.4 x 10 9 


7657 


CRN 


98,100 


2572 ± 120 


« 2.6 x 10 8 


4489 


CRP 


22,200 


2532 ± 120 


« 2.8 x 10 8 


1533 


CFD2 


1150 


2565 ± 120 


« 5.8 x 10 6 


61 



TABLE IV. Required estimates, updates, and computational 
time needed for 95% confidence intervals of ± 120 for all 
five methods on the computation of the mRNA transcription 
model computation of subsection IV Bl. An e of 1/20 was 
used for the finite difference methods. 



2. Mixed partial of protein abundance 



We compare the five methods in the estimation of 
j^gEX P (30) at = 1/4 and 7 = 10, which can be cal- 
culated exactly to be -31.8. Table V shows the approxi- 
mations and computational complexity of these methods 
using 5,000 estimates. 

Note that in this example, the LR method outperforms 
all methods, except CFD2, with respect to computation 
time. Thus, for comparison, we have also included the 
results of a test using the LR method in which the CPU 



Method 


Estimates 


Approximation 


# updates 


CPU time (s) 


IRN 


5,000 


607 ± 923 


« 8.43 x 10 7 


264 


CRN 


5,000 


191.5 ± 330 


« 8.42 x 10 7 


273 


CRP 


5,000 


21.0 ± 96 


« 8.41 x 10 7 


365 


CFD2 


5,000 


-33.8 ± 4 


« 2.25 x 10 7 


238 


LR 


5,000 


-15.4 ± 113 


« 2.10 x 10 7 


73 


LR 


17,000 


-61.8 ± 68 


« 6.72 x 10 7 


234 



TABLE V. 95% confidence intervals and computational com- 
plexity for all five methods, after 5, 000 estimates, for the 
computation of the mixed partial derivative in the mRNA 
transcription model as in subsection IV B 2. An e of 1/25 was 
used for the finite difference methods. Additionally, results 
from the LR method with CPU time approximately that of 
CFD2 are included for comparison. Actual value: -31.8. 



time is approximately the same as CFD2; note that the 
confidence interval for the CFD2 method is much smaller. 
Figure 2 shows variance plots of the CRN and CRP, and 
CFD2 methods over time in simulation. 



3. 2 nd derivative of the square of protein abundance with 
respect to 9 

We also calculate, from the mRNA transcription model 
of Example 4.1, ^E(X P (t) 2 ) at t = 5 and 6 = \, with 
7 = 10 and Xq — 0. Note here we are considering a 
function / of the state space which is non-linear. 

In Figure 3, we plot the log of the variance of the 
numerator of the estimator (6) versus the log of ep- 
silon. Since we expect, for the double coupled CFD2 
method, that this variance V(e) should scale like Ce p 
for some constants C and p, we see that the slope of 
log(V(e)) = log(C) + plog(e) from our simulations will 
suggest the value of p. This plot suggests that p = 2; 
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IRN, CRN, & CRP, 10,000 estimates,e=1/20 
3 x 10 4 




500 



,x 10 



.1 4 



o 3 



5 2 



50 100 150 

time in simulation 

CFD2, 10,000 estimates, £=1/20 



200 




100 150 
time in simulation 



LR, 10,000 estimates 



200 




50 100 150 

time in simulation 



200 



FIG. 1. Variance versus time of the estimators of the five 
different methods, applied to the calculation of J^EXp(#, t) 
in the mRNA transcription model of subsection IV B 1. Note 
that the scales are vastly different. 



since the numerator of the estimator is then divided by 
e 2 in d(e), this suggests a final variance of 0(R~ 1 e~ 2 ) for 
the estimator (5) as discussed in Section 2. 

For comparison, the slope of this log-log plot for the 
IRN method is zero, as the variance of the numerator 
does not depend on epsilon, giving a final variance of 
0(R~ 1 e~ 4 ). The slopes for the associated log-log plots 
for the CRN and CRP estimators will vary with time 
(discussed further in subsection IV D 1). 

The general behavior of the variances over time for the 
IRN, CRN, and CRP methods can be seen in Figure 4. 



12 
10 

8 
6 



x 10 



4 CRN & CRP, 5,000 estimates,£=1/25 



CRN 
-CRP 



« 4- 



Lu 



50 100 150 

time in simulation 



CFD2, 5,000 estimates, £=1/25 



200 




100 

time in simulation 



200 



FIG. 2. Plots of variance over time for the CRN and CRP 
methods, and the CFD2 method, in computing the mixed par- 
tial derivative of the mRNA transcription model of subsection 
IV B 2. Note the very different scales. For comparison, the 
IRN method plateaus at a variance of approximately 5 x 10 6 . 



C. Quadratic Decay 



In order to demonstrate that the 0(e 2 ) convergence 
rate seen in the previous examples does not universally 
hold, we consider a pure decay process of a population 
X t , so that the sole reaction has £ = — 1 and quadratic 
propensity \(O,X t (0)) = 9X t (9){X t (9) - 1), and cal- 
culate - E {^)2^ with 9 = 1 and with initial population 
X (8) = 2000. Figure 5 gives a log-log plot of variance 
versus epsilon at time 0.001. Since it suggests p = 1, 
this demonstrates that, in this case, the double cou- 
pled method provides only 0(R~ 1 e~ 3 ) convergence as 
discussed in Section II, showing that rate to be sharp. 

As demonstrated in Table VI and in Figure 6, however, 
the double coupled method is still significantly more ef- 
ficient than existing methods on this model. 
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Method 


e 


Estimates 


Approximation 


# updates 


CPU time (s) 


LR 


n/a 


10,000 


1240 ± 1070 


w 1.3 x 10 7 


9 


IRN 


1/20 


10,000 


555 ± 218 


« 4.8 x 10 7 


35 


CRN 


1/20 


10,000 


585 ± 52 


w 4.0 x 10 7 


30 


CRP 


1/20 


10,000 


584 ± 52 


« 4.0 x 10 7 


30 


CFD2 


1/20 


10,000 


592 ± 5 


w 1.4 x 10 7 


90 


CRN 


1/20 


272,000 


588 ± 10 


w 1.1 x 10 9 


813 


CRP 


1/20 


271,000 


589 ± 10 


« 1.1 x 10 9 


862 


CFD2 


1/20 


1,950 


592 ± 10 


« 2.7 x 10 6 


17 


CRN 


1/50 


169,500 


543 ± 50 


w 6.8 x 10 s 


511 


CRP 


1/50 


169,000 


515 ± 50 


« 6.8 x 10 s 


510 


CFD2 


1/50 


1,800 


605 ± 50 


w 2.5 x 10 6 


16 



TABLE VI. Estimates, e used, and updates and computational time needed for the given 95% confidence intervals for all 
methods for 8 at t — 0.001 for the quadratic decay model of subsection IV C. The upper half of the table shows the 

relevant results after the simulation of 10,000 estimates. The lower half of the table shows the results of simulations run until 
the estimate had a confidence interval of a desired width. The IRN and LR methods were unable to achieve these precisions 
due to memory constraints. Note again the equivalence of the CRN and CRP methods on a single reaction model. 




FIG. 3. This log-log plot of variance versus epsilon (100,000 
estimates) for the mRNA transcription model computation 
of subsection IV B 3 suggests that the CFD2 method gives an 
estimator of 0(e -2 ) even though the function / of the system 
state is non-linear: the slope of the best fit line is 1.98. 



FIG. 4. Plot of variance over time for 5,000 estimates of the 
IRN, CRN, and CRP methods for the mRNA transcription 
model computation of subsection IV B 3. The variance of the 
CFD2 method is too small to be seen at this scale; at time 
200 it is approximately 3.5 x 10 8 . 



D. Genetic Toggle Switch 

Finally, we consider a model of a genetic toggle switch 
that also appeared in 19 and 10, 



where 

A x (i) = 



A 



i + x B {ty 



D 



and X 2 (t) = — 



X A (t) a 



and where X^ii) and Xs(t) denote the number of gene 
products from two interacting genes. Note that each gene 
product inhibits the growth of the other. 



We take parameter values of b = 50, j3 = 2.5, a = 16 
and will differentiate with respect to a. Note that this 
model does not follow mass action kinetics, or have linear 
propensities. In subsection IV D 1, we consider a second 
derivative of EXb at a fixed time, while in subsection 
IV D 2, we consider a second derivative of the expected 
time average of Xa up to a given time, which is a func- 
tional of the path of Xa rather than simply Xa at some 
terminal time. 
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FIG. 5. This log plot of variance versus epsilon (each point 
computed to the first of 300,000 estimates or a confidence of 
±10) for the decay model of subsection IV C suggests that the 
CFD2 method gives an estimator of only 0(e -3 ): the slope 
of the best fit line is approximately 0.97. While the CRN and 
CRP methods also give an estimator of this same rate (the 
slope of the best fit lines are both «.99, and in fact the lines 
are on top of each other), the variance of the estimates from 
CRN and CRP are significantly higher than those from the 
CFD2 method, as can be seen by the wide gap between the 
above curves. 



1. 2 nd derivative of abundance of B with respect to a 

We estimate d E ^ B ( a '*) at a = 1 and at two times, 
5 and 400. In Figure 7, we plot the log of the variance 
of the numerator of the estimator (6), using CFD2, ver- 
sus the log of the perturbation epsilon. As in subsection 
IV B 2, the plot clearly suggests that p = 2. We also plot 
the same quantity using CRP and CRN. These slopes, 
on the other hand, vary with time. For small times both 
slopes are close to one, but as time increases the slopes 
decrease, until, for very large times, they are close to zero. 
This corresponds with the fact that for large times the 
variances of the CRP and CRN estimates converge to val- 
ues on the order of the IRN estimate variance, which, as 
previously noted, is independent of the value of epsilon. 
The general behavior of the variances over time can be 
seen in Figure 8, where it is seen that CFD2 has a vari- 
ance that is 16 times lower than CRN and 36 times lower 
than CRP. Further, we note that for this model the CRP 
method outperforms the CRN method for small times, 
while for larger times CRN outperforms CRP. 



2. 2 derivative of time average of abundance of A with 
respect to a 

Finally, while this was not discussed in the paper, we 
include an example computing a sensitivity of a path 
functional. That is, the quantity we wish to study is 
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FIG. 6. The behavior over time of the variance of the esti- 
mates of the CRN, CRP, and CFD2 methods on the quadratic 
decay model of subsection IV C. Note that the variance for the 
CFD2 method is 100 times smaller than the other two meth- 
ods, which, as expected, act the same on this model. An e of 
1/20 was used and 10,000 estimates were run. The plot of the 
IRN variance is similar in shape but with a peak variance of 
3.1xl0 4 . 



a function of the path of the process X(s) for s < t, 
rather than just the terminal value X(t). The only dif- 
ference in implementation is the need to compute this 
quantity during the simulation of the path (or to store 
the path for the computation after its simulation) . Table 

VII shows the estimates of gjp-E t^ 1 f Q Xa(s)cIs at t = 30 
using the various finite difference methods, demonstrat- 
ing the advantage of the double coupled method for these 
path functional quantities as well. Additionally, Figure 
9 shows that the overall behavior of the variances of the 
three finite difference methods remains the same as in 
the previous examples. 
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FIG. 7. These log-log plots (25,000 estimates) of variance 
versus epsilon for computation on the gene toggle model as 
in subsection IV Dl, at two different times, suggest that the 
double coupled method gives an estimator of 0(e~ 2 ) even 
though two of the intensities are nonlinear: the slope of the 
best fit line for the CFD2 method is approximately 2 (=1.97) 
at both times. The slope for the CRP and CRN methods, on 
the other hand, are approximately .74 and .90 respectively at 
time 5, but are only around .03 and .49 at time 400. 



Method 


Estimates 


Approximation 


CPU time (s) 


IRN 


100,000 


-13.8 ± 621 


2240 


CRN 


100,000 


-274 ± 146 


1441 


CRP 


100,000 


-215 ± 107 


3035 


CFD2 


100,000 


-222 ± 26 


2722 



TABLE VII. 95% confidence intervals and computational 
complexity for each of the methods (a) through (d), after 
100, 000 estimates, for the time average computation at t = 30 
on the gene toggle model of subsection IV D 2. An e of 1/50 
was used. 
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FIG. 8. Plots of variance over time for 10,000 estimates of 
the finite difference methods for the gene toggle model as in 
subsection IV D 1; the top includes time up to 200, while the 
bottom provides a close-up view of the plot for times less 
than 10. The value of the CFD2 variance at time 200 is ap- 
proximately 4,000, while the CRN variance is approximately 
65,000. 



V. CONCLUSIONS AND FUTURE WORK 

We have introduced a new, efficient method for the 
computation of second derivative sensitivities for discrete 
biochemical reaction networks. Through several numer- 
ical examples we have demonstrated its advantage over 
existing methods, both in simple scenarios and in more 
realistic systems, including several examples in which the 
system contained nonlinear propensities, or in which the 
relevant quantity to be studied involved a nonlinear func- 
tion / or even a path functional of the system state. Fu- 
ture work will include proving analytical bounds on the 
variance of the estimator given by the new method and 
exploring conditions in which a better convergence rate is 
achieved, as well as finding efficient algorithms to simul- 
taneously compute all of the second order sensitivities of 
models with a large number of parameters. Another av- 
enue of future work will involve incorporating algorithms 
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FIG. 9. At top, a plot of variance over time for 5,000 esti- 
mates for the finite difference methods for the path functional 
computation on the gene toggle model in subsection IV D 2. 
The value of the CFD2 variance at time 200 is approximately 
19,000, while the CRN variance is approximately 282,000. At 
bottom, a log-log plot (2,000 estimates) of variance versus ep- 
silon for this computation suggests that the double coupled 
method gives an estimator converging faster than 0(e -3 ) in 
this computation as well: the slope of the best fit line for the 
CFD2 method is approximately 1.78. 



for the computation of second derivatives into the op- 
timization methods discussed in the introduction in the 
context of parameter estimation. 
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